Open Source Speech and Language Resources for Frisian

Authors

  • Emre Yilmaz
  • Henk van den Heuvel
  • Jelske Dijkstra
  • Hans Van de Velde
  • Frederik Kampstra
  • Jouke Algra
  • David A. van Leeuwen
Abstract

In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân, located in the north of the Netherlands. Native speakers of Frisian are Frisian-Dutch bilingual and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech database of radio broadcasts, a phonetic lexicon with more than 70k words, and a language model trained on a text corpus of more than 38M words. With this contribution, we aim to share the Frisian resources collected in the scope of the FAME! project, in which a spoken document retrieval system is built to disclose the radio archives of the regional broadcaster. These resources enable research on code-switching and on longitudinal speech and language change. In addition, a sample automatic speech recognition (ASR) recipe for the Kaldi toolkit will be provided online to facilitate research on Frisian ASR.
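The abstract mentions a phonetic lexicon and a Kaldi recipe. The Python sketch below shows how such a lexicon could be loaded. It assumes the lexicon is a plain-text file that follows Kaldi's `lexicon.txt` convention, with one entry per line: a word followed by its space-separated phones. It reads these lines into a map from words to pronunciations and collects the phone inventory. The function names `load_lexicon` and `phone_inventory` are hypothetical, and so are the toy entries. None of them are taken from the released resources.

```python
from collections import defaultdict


def load_lexicon(lines):
    """Parse Kaldi-style lexicon lines of the form '<word> <phone1> <phone2> ...'.

    Assumption: the released lexicon uses this layout. A word may appear on
    several lines (pronunciation variants), so each word maps to a list of
    phone sequences.
    """
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines or lines without any phones
        word, phones = fields[0], fields[1:]
        lexicon[word].append(phones)
    return dict(lexicon)


def phone_inventory(lexicon):
    """Collect the set of phones used across all pronunciations."""
    return {p for prons in lexicon.values() for pron in prons for p in pron}


# Hypothetical toy entries, not taken from the actual Frisian lexicon.
sample = [
    "word_a p1 p2 p3",
    "word_a p1 p4",  # a second pronunciation variant of the same word
    "word_b p2 p3",
    "",              # blank lines are ignored
]
lex = load_lexicon(sample)
print(len(lex))                      # 2
print(sorted(phone_inventory(lex)))  # ['p1', 'p2', 'p3', 'p4']
```

To load a real file, the file object can be passed in directly, for example `load_lexicon(open("lexicon.txt", encoding="utf-8"))`. The file name here is hypothetical. Opening the file as UTF-8 matters because Frisian spelling uses characters such as â, ê, ô and û.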

Similar resources

Frisian TTS, an example of bootstrapping TTS for minority languages

A Frisian adaptation of a Dutch TTS system based on Festival, NeXTeNS, is presented as a case study in prototyping TTS for resource-poor minority languages. For these languages, demonstrator systems are essential to seed projects in speech and language technology. The conversion of a Dutch TTS system to a new language with minimal speech and language resources, Frisian, demonstrates that a TTS ...

Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech

In this paper, we present a new longitudinal and bilingual broadcast database designed for speaker clustering and text-independent verification research. The broadcast data is extracted from the archives of Omrop Fryslân, the regional broadcaster in the province of Fryslân, located in the north of the Netherlands. Two speaker verification tasks are provided in a standard enrollment-test ...

Investigating Bilingual Deep Neural Networks for Automatic Recognition of Code-switching Frisian Speech

In this paper, a code-switching automatic speech recognition (ASR) system built for the Frisian language is described. Frisian is mostly spoken in the province of Fryslân, which is located in the north of the Netherlands. The native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. In the scope of the FAME! Pr...

FRYSS: A First Step towards Frisian TTS

In this MA-project a Dutch TTS system based on Festival, NeXTeNS, has been changed step by step into a Frisian system. As for many minority languages, Frisian too has very few digital resources. So, the challenge of this project is to make the system as intelligible as possible with minimal resources. The resulting TTS system is called FRYSS. At the end of the thesis period an evaluation was ca...

Word and phrasal stress disentangled: Pitch peak alignment in Frisian and Dutch declarative structures

This paper investigates intonational pitch variations and pitch peak alignment in declarative sentences and is part of a larger study of declarative, interrogative and imperative grammatical constructions in the Frisian-Dutch contact situation. Frisian is a minority language spoken in the province of Fryslân in the Netherlands. Following Jun [19], we devised a reading task in which phrasal into...

Publication date: 2016